WO 99/33274 PCT/US98/26984
SCALABLE PREDICTIVE CODING METHOD AND APPARATUS 



BACKGROUND OF THE INVENTION 

1. Field of the Invention 

This invention pertains generally to data compression methods and
systems, and more particularly to an efficient scalable predictive coding method
and system where most or all of the information available to the
enhancement-layer is exploited to improve the quality of the prediction.

2. Description of the Background Art 

The following publications, which are referenced herein using numbers in
square brackets (e.g., [1]), are incorporated herein by reference:

[1] D. Wilson and M. Ghanbari, "Transmission of SNR scalable two layer
MPEG-2 coded video through ATM networks," Proc. 7th International Workshop
on Packet Video, pp. 185-189, Brisbane, Australia, March 1996.

[2] B. Girod, U. Horn, and B. Belzer, "Scalable video coding with
multiscale motion compensation and unequal error protection," in Y. Wang, S.
Panwar, S.-P. Kim, and H. L. Bertoni, editors, Multimedia Communications and
Video Coding, pp. 475-482, New York: Plenum Press, 1996.

[3] B. G. Haskell, A. Puri, and A. N. Netravali, Digital Video: An
Introduction to MPEG-2. New York: Chapman and Hall, International Thomson
Pub., 1997.

[4] Draft text of H.263, Version 2 (H.263+).

[5] T. K. Tan, K. K. Pang, and K. N. Ngan, "A frequency scalable coding
scheme employing pyramid and subband techniques," IEEE Transactions on
Circuits and Systems for Video Technology, pp. 203-207, April 1994.

[6] A. Gersho and R. M. Gray, Vector Quantization and Signal
Compression. Kluwer Academic Press, 1992.

Many applications require data, such as video, to be simultaneously
decodable at a variety of rates. Examples include applications involving broadcast
over differing channels, multicast in a complex network where the channels/links
dictate the feasible bit rate for each user, the co-existence of receivers of different
complexity (and cost), and time-varying channels. An associated compression
technique is "scalable" if it offers a variety of decoding rates using the same basic
algorithm, and where the lower rate information streams are embedded in the
higher rate bit-streams in a manner that minimizes redundancy.

A predictive coding system for encoding and decoding a signal without
scalability is well-known in the literature of signal compression (see, for example,
predictive vector quantization [6], and motion-compensated predictive transform
coding of video [3]). In such predictive coding systems the encoder includes a
decoder and memory so that what is actually encoded is the difference between
the input signal and a predicted version of the reproduced signal, this difference
signal being called the residual. The decoder contains a prediction loop whereby
the current residual frame is decoded and then added to a prediction of the
current frame obtained from the previous reproduced frame. In some cases, the
predictor uses several prior frames to predict the current frame.
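
The prediction loop described above can be illustrated with a minimal scalar sketch. It is not taken from the cited references: the uniform quantizer, the first-order predictor `rho * x_hat_prev`, and all function names are assumptions chosen for illustration, and each "frame" is a single sample as in DPCM.

```python
import numpy as np

def quantize(r, step):
    """Uniform scalar quantizer (assumed): returns the reconstructed residual."""
    return step * np.round(r / step)

def predictive_encode(x, step, rho=1.0):
    """Closed-loop encoder: quantizes the residual against its own reconstruction."""
    x_hat_prev = 0.0
    residuals = []
    for sample in x:
        prediction = rho * x_hat_prev            # P[x_hat(n-1)]
        r_hat = quantize(sample - prediction, step)
        residuals.append(r_hat)
        x_hat_prev = prediction + r_hat          # encoder mirrors the decoder state
    return np.array(residuals)

def predictive_decode(residuals, rho=1.0):
    """Decoder prediction loop: prediction from the previous reproduced frame plus residual."""
    x_hat_prev = 0.0
    out = []
    for r_hat in residuals:
        x_hat_prev = rho * x_hat_prev + r_hat
        out.append(x_hat_prev)
    return np.array(out)
```

Because the encoder predicts from the reproduced signal rather than from the original, the decoder's reconstruction error in this sketch equals the quantization error of the residual and does not accumulate over time.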

A major difficulty encountered in scalable predictive coding is how to take
advantage of the additional information, available to the enhancement-layer
decoder for improved prediction, without causing undesired conflicts with the
information obtained from the base layer. FIG. 1 depicts a two-layer scalable
coding system 10 where it is assumed that the original input signal (e.g., an audio
or video signal) is segmented into frames that are sequentially encoded. Typical
examples are video frames and speech frames, but "frame" here will also cover
the degenerate case of a single sample as in differential pulse code modulation
(DPCM). The term "frame" as used herein refers either to a group of contiguous
samples of an original input signal or a set of parameters extracted from the
original group of samples (such as a set of transform coefficients obtained by a
discrete-cosine transform (DCT) operation on the original group of samples), and in
each case the terminology "frame" or "signal" will be used to refer to this entity that
is representative of the original group of samples or is itself the original group of
samples.

The input frame 12, $x(n)$, is compressed by the base encoder (BE) 14
which produces the base bit-stream 16. The enhancement-layer encoder (EE) 18
has access to the input frame 12 and to any information produced by or available
to BE 14. EE 18 uses this data to generate the enhancement-layer bit-stream 20.
A base decoder (BD) 22 receives the base bit-stream 16 and produces a
reconstruction 24, $\hat{x}_b(n)$, while the enhancement-layer decoder (ED) 26 has
access to both bit-streams and produces an enhanced reconstruction 28, $\hat{x}_e(n)$.
The reconstruction frames that are available at the decoder are used to predict or
estimate the current frame. Note that ED 26 has access to both bit streams and
hence it effectively has access to both the reconstruction frame at the base layer,
$\hat{x}_b(n)$, and the previous reconstructed frame at the enhancement layer, $\hat{x}_e(n-1)$,
while BD 22 has access only to the previous reconstructed frame at the base
layer, $\hat{x}_b(n-1)$, which is stored in the memory within BD. In the case of a scalable
coding system with multiple enhancement layers, an enhancement layer decoder
may have access to the reconstruction frames from lower enhancement layers as
well as from the base layer. The prediction loop (internal to the operation of BD as
in any predictive coding system but not shown in the figure) in this configuration
causes severe difficulties in the design of scalable coding. Accordingly, a number
of approaches to scalable coding have been developed. These include:

(1) The standard approach: At the base layer, BE 14 compresses the
residual $r_b(n) = x(n) - P[\hat{x}_b(n-1)]$, where $P$ denotes the predictor (e.g., motion
compensator in the case of video coding). Note that for notational simplicity we
assume first-order prediction, but in general several previous frames may be used.
BD 22 produces the reconstruction $\hat{x}_b(n) = P[\hat{x}_b(n-1)] + \hat{r}_b(n)$, where $\hat{r}_b(n)$ is the
compressed-reconstructed residual. At the enhancement-layer, EE 18
compresses the base layer's reconstruction error
$r_e^{(1)}(n) = x(n) - \hat{x}_b(n) = x(n) - P[\hat{x}_b(n-1)] - \hat{r}_b(n)$. The enhancement-layer
reconstruction is
$\hat{x}_e(n) = \hat{x}_b(n) + \hat{r}_e^{(1)}(n) = P[\hat{x}_b(n-1)] + \hat{r}_b(n) + \hat{r}_e^{(1)}(n)$;
see, e.g., [1]. A deficiency of
this approach is that no advantage is taken of the potentially superior prediction
due to the availability of $\hat{x}_e(n-1)$ at the ED 26.

(2) The separate coding approach: BE 14 compresses $r_b(n)$ as above,
but EE 18 compresses the "enhancement-only" prediction error
$r_e^{(2)}(n) = x(n) - P[\hat{x}_e(n-1)]$ directly. The enhancement-layer reconstruction is
$\hat{x}_e(n) = P[\hat{x}_e(n-1)] + \hat{r}_e^{(2)}(n)$. A deficiency of this approach is that, while the
approach takes advantage of information available only to the enhancement-layer,
it does not exploit the knowledge of $\hat{r}_b(n)$ which is also available at the
enhancement-layer. The two layers are, in fact, separately encoded except for
savings on overhead information which need not be repeated (such as motion
vectors in video coding) [2].

(3) Layer-specific prediction at the decoder approach: BD 22
reconstructs the frame as $\hat{x}_b(n) = P[\hat{x}_b(n-1)] + \hat{r}_b(n)$, and ED 26 reconstructs as
$\hat{x}_e(n) = P[\hat{x}_e(n-1)] + \hat{r}_b(n) + \hat{r}_e(n)$. However, the encoders BE 14 and EE 18 use the
same prediction [3], and the options are:

(a) Both encoders use base-layer prediction $P[\hat{x}_b(n-1)]$. This
results in drift of the enhancement-layer decoder. (The term "drift" refers to
a form of mismatch where the decoder uses a different prediction than the
one assumed by the encoder. This mismatch tends to grow as the
"corrections" provided by the encoder are misguiding; hence, the decoder
"drifts away".)

(b) Both encoders use enhancement-layer prediction $P[\hat{x}_e(n-1)]$.
This results in drift of the base-layer decoder.

(4) Switch between approaches (1) and (2) on a per frame or per block
basis [4], or per sample [5]. This approach has the deficiencies of either approach
(1) or (2) as described above, at each time depending on the switching decision.
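
Approaches (1) and (2) above can be contrasted with a small scalar sketch. It rests on stated assumptions rather than on the cited systems: the predictor $P$ is taken as multiplication by `rho`, both layers use an assumed uniform quantizer, and the function names are hypothetical.

```python
import numpy as np

def quantize(r, step):
    """Uniform scalar quantizer (assumed) returning the reconstructed residual."""
    return step * np.round(r / step)

def two_layer_standard(x, step_b, step_e, rho=1.0):
    """Approach (1): the enhancement layer codes the base reconstruction error."""
    xb_prev = 0.0
    xb, xe, re = [], [], []
    for s in x:
        pred_b = rho * xb_prev                      # P[x_hat_b(n-1)]
        xb_n = pred_b + quantize(s - pred_b, step_b)
        re_hat = quantize(s - xb_n, step_e)         # r_e^(1)(n), quantized
        xb.append(xb_n)
        xe.append(xb_n + re_hat)
        re.append(re_hat)
        xb_prev = xb_n
    return np.array(xb), np.array(xe), np.array(re)

def two_layer_separate(x, step_b, step_e, rho=1.0):
    """Approach (2): each layer runs its own prediction loop."""
    xb_prev, xe_prev = 0.0, 0.0
    xb, xe, re = [], [], []
    for s in x:
        pred_b = rho * xb_prev
        xb_n = pred_b + quantize(s - pred_b, step_b)
        pred_e = rho * xe_prev                      # P[x_hat_e(n-1)]
        re_hat = quantize(s - pred_e, step_e)       # r_e^(2)(n), quantized
        xe_n = pred_e + re_hat
        xb.append(xb_n)
        xe.append(xe_n)
        re.append(re_hat)
        xb_prev, xe_prev = xb_n, xe_n
    return np.array(xb), np.array(xe), np.array(re)
```

In this sketch both approaches reach the same enhancement-layer accuracy; they differ in which residual the enhancement layer must transmit, which is where the deficiencies noted above would show up as bit rate.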

BRIEF SUMMARY OF THE INVENTION 
The present invention addresses the prediction loop deficiencies in
conventional scalable coding methods and systems in a way that achieves
efficient scalability of predictive coding. The approach is generally applicable and
may, in particular, be applied to standard video and audio compression. In the
present invention, most or all of the information available at an enhancement-layer
may be exploited to improve the quality of the prediction.

By way of example, and not of limitation, in the present invention the
current frame is predicted at the enhancement-layer by processing and combining
the reconstructed signal representing: (i) the current base-layer (or lower layers)
frame; and (ii) the previous enhancement-layer frame. The combining rule takes
into account the compressed prediction error of the base-layer, and the
parameters used for its compression. The main difficulty overcome by this
invention is in the apparent conflicts between these two sources of information
and their impact as described in the Background of the Invention. This difficulty
may explain why existing known methods exclusively use one of these
information sources at any given time. These methods will be generally referred
to here as switching techniques (which include as a special case the exclusive use
of one of the information sources at all times). Additionally, the invention
optionally includes a special enhancement-layer synchronization mode for the
case where the communication rate for a given receiver is time varying (e.g., in
mobile communications). This mode may be applied periodically to allow the
receiver to upgrade to enhancement-layer performance even though it does not
have prior enhancement-layer reconstructed frames.

An object of the invention is to achieve efficient scalability of predictive 
coding. 

Another object of the invention is to provide a method and system for
scalable predictive coding that is applicable to typical or standard video and audio
compression.

Another object of the invention is to provide a scalable predictive coding 
method and system in which all or most of the information available at an 
enhancement-layer is exploited to improve the quality of the prediction. 

Further objects and advantages of the invention will be brought out in the
following portions of the specification, wherein the detailed description is for the
purpose of fully disclosing preferred embodiments of the invention without placing
limitations thereon.

BRIEF DESCRIPTION OF THE DRAWINGS 
The invention will be more fully understood by reference to the following
drawings which are for illustrative purposes only:

FIG. 1 is a functional block diagram of a conventional two-layer scalable
predictive coding system.

FIG. 2 is a functional block diagram of an enhancement layer encoder of a
scalable predictive coding system in accordance with the present invention.

FIG. 3 is a functional block diagram of a base layer reconstruction module
according to the present invention.

FIG. 4 is a functional block diagram of an enhancement layer reconstruction
module according to the present invention.

FIG. 5 is a functional block diagram of a three-layer scalable encoder
employing the enhancement encoder of the present invention.

FIG. 6 is a functional block diagram of a three-layer scalable decoder
corresponding to the encoder shown in FIG. 5.

FIG. 7 is a functional block diagram of a two-layer scalable video encoder
employing the enhancement encoder of the present invention.

FIG. 8 is a functional block diagram of a two-layer decoder corresponding
to the encoder shown in FIG. 7.

FIG. 9 is a functional block diagram of the spatial motion compensator
blocks shown in FIG. 7 and FIG. 8.



DETAILED DESCRIPTION OF THE INVENTION 

Referring more specifically to the drawings, where like reference numbers,
labels and symbols denote like parts, for illustrative purposes the present invention
will be described with reference to the encoder generally shown in FIG. 2, as well
as the encoding system shown in FIG. 2 through FIG. 6, and the scalable
predictive coding method described in connection therewith. Various
embodiments of encoders and decoders employing the present invention, and
details thereof, are shown and described in FIG. 7 through FIG. 9.

The method of the present invention generally comprises upgrading the
prediction used at each enhancement-layer by combining, with minimal conflict,
the information provided from both sources, namely, information available at, and
used by, the base-layer (or lower layers), and information that is available only at
the enhancement-layer. In the case of a scalable predictive coding system with
multiple enhancement layers, the prediction at an enhancement layer may
combine information provided from all lower enhancement layers as well. The
invention provides for prediction or estimation of the signal frame itself in any
representation, or any subset of signal representation coefficients such as
transform coefficients (e.g., in video, audio), line spectral frequencies (e.g., in
speech or audio), etc. The term "frame" and the corresponding mathematical
notation will be used generally to refer to the relevant set of frame coefficients
being estimated or predicted by the method in each particular application.

Referring first to FIG. 2, a functional block diagram of an enhancement
layer encoder of a scalable predictive coding system in accordance with the
present invention is shown. In the enhancement layer encoder 100 of the present
invention, an enhancement layer estimator (ELE) 102 computes a new predicted
frame 104, $\tilde{x}_e(n)$, by combining information from the reconstruction frame 106 at
the base layer, $\hat{x}_b(n)$, and from the previous reconstructed frame 108 at the
enhancement layer, $\hat{x}_e(n-1)$. Note that first order prediction is described for
notational simplicity but several previous frames may be used. The combining
rule depends on any or all of, but not limited to, the following parameters: the
compression parameters 110 of the base layer (such as quantization step and
threshold, and the quantized base-layer residual 112, $\hat{r}_b(n)$ (see FIG. 3)), and the
statistical parameters 114 of the time evolution of the frames (such as inter-frame
correlation coefficients and variance). The statistical parameters may be either
estimated off-line from training data, or estimated on-line by an adaptive estimator
which tracks variation in the signal statistics based on either the original signal (in
which case the parameters need to be transmitted to the decoder) or on
reconstructed signals which are available to the receiver. The exact definition of
the combination rule depends on the level of complexity allowed for the module.
At the high end, one may compute a possibly complex, optimal predicted frame
given all the available information. The enhancement layer residual 116, $r_e(n)$,
which is the difference between the input frame 118, $x(n)$, and the predicted
frame 104, $\tilde{x}_e(n)$, is then compressed by a compressor 120 to produce the
enhancement bits 122.

Referring to FIG. 3 through FIG. 6, a complete scalable predictive coding
system for use with this invention is shown. While only three layers are shown, it
will be appreciated that additional layers can be added and are contemplated
within the scope of the invention. FIG. 3 shows a base layer reconstruction
module 124 which receives the quantized base layer residual 112, $\hat{r}_b(n)$, and adds
it to the base predicted frame 126, $\tilde{x}_b(n)$, to produce the base layer reconstruction
frame 106, $\hat{x}_b(n)$. A delay 128 produces a delayed base reconstructed frame 130,
$\hat{x}_b(n-1)$, which is input to the base predictor 132 which computes the base
predicted frame 126, $\tilde{x}_b(n)$, which is needed to produce the reconstructed frame
as explained above.

The enhancement layer reconstruction module 134 shown in FIG. 4
receives the quantized enhancement layer residual 136, $\hat{r}_e(n)$, and adds it to the
enhancement layer predicted frame 104, $\tilde{x}_e(n)$, to produce the enhancement layer
reconstruction frame 138, $\hat{x}_e(n)$. A delay 140 produces a delayed enhancement
layer reconstructed frame 108, $\hat{x}_e(n-1)$, which is input to the enhancement layer
estimator 102, which in turn computes the enhancement layer predicted frame
104, $\tilde{x}_e(n)$, as explained with reference to FIG. 2.

FIG. 5 shows how the modules described in FIG. 2 through FIG. 4 may be
combined to obtain a complete scalable predictive encoder. Only three layers are
shown without implying any limitation, as extension to further layers is obvious
and straightforward. Most inputs and outputs were explained in the context of the
previous figures, and to distinguish between the notation for the first and second
enhancement layer signals, the prefix EL1 or EL2 was added, respectively.

The signal frame to be compressed (which may be the original raw signal,
or any set of coefficients extracted from it for the purpose of compression),
denoted $x(n)$, is fed to all layers in parallel. In each layer the predicted frame
($\tilde{x}_b(n)$ in the base layer, (EL1) $\tilde{x}_e(n)$ in the first enhancement layer, and (EL2)
$\tilde{x}_e(n)$ at the second enhancement layer) is subtracted from $x(n)$ to obtain the
prediction error (or residual) at the layer ($r_b(n)$, (EL1) $r_e(n)$, and (EL2) $r_e(n)$, for
the base, first enhancement and second enhancement layers, respectively). The
residual is compressed by the layer's Compressor/Quantizer which outputs: the
layer's bits for transmission to the decoder; the reconstructed (quantized) residual
($\hat{r}_b(n)$, (EL1) $\hat{r}_e(n)$, and (EL2) $\hat{r}_e(n)$, for the base, first enhancement and second
enhancement layers, respectively), as input to the layer's reconstruction module;
and the set of compression parameters for use by a higher layer. Note that the
enhancement layer compressor/quantizer subsumes the compressor 120 of FIG. 2
as, besides the bit stream, it also outputs the quantized residual. The
reconstruction module of each layer processes its input signals as per FIG. 3
and FIG. 4, and outputs the reconstructed frame for the layer ($\hat{x}_b(n)$, (EL1) $\hat{x}_e(n)$, and
(EL2) $\hat{x}_e(n)$, for the base, first enhancement and second enhancement layers,
respectively), and the layer's predicted frame ($\tilde{x}_b(n)$, (EL1) $\tilde{x}_e(n)$, and (EL2) $\tilde{x}_e(n)$,
for the base, first enhancement and second enhancement layers, respectively).

The corresponding three layer scalable predictive decoder is shown in
FIG. 6. Each layer's inverse compressor/quantizer receives as input the layer's bit
stream from which it reproduces the layer's quantized residual. It also extracts the
layer's compression parameters for use by a higher layer reconstruction module.
The rest of the diagram is identical to the encoder of FIG. 2 and similarly produces
the reconstructed frame at each layer.



It will be appreciated that the invention is generally applicable to predictive
coding and, in particular, may be applied to known vector quantizer-based
compression techniques, and known transform-based techniques. Further, it is
applicable to compression of speech, audio, and video signals. A combining rule
employing optimal estimation for scalable compression is described next as an
implementation example of the invention.

In typical predictive coding, a number of signal representation coefficients
(e.g., vectors of transform coefficients, line spectral frequencies, or vectors of raw
signal samples) are extracted per frame and quantized independently. A specific
low complexity implementation of the invention consists of optimally combining the
information available for predicting the coefficient at an enhancement-layer. The
reconstructed coefficient at the base-layer, $\hat{x}_b(n)$, and the quantization interval (or
partition region in the case of vector quantization) of the corresponding
reconstructed residual $\hat{r}_b(n)$, determine an interval/cell $I(n)$ within which the
original coefficient $x(n)$ must lie. From the corresponding reconstructed
coefficient at the previous enhancement-layer frame, $\hat{x}_e(n-1)$, and a statistical
model on time evolution of the coefficients, one may construct a probability density
function for $x(n)$ conditional on $\hat{x}_e(n-1)$, denoted by $p[x(n) \mid \hat{x}_e(n-1)]$. The optimal
estimate of $x(n)$ is obtained by expectation:

$$\tilde{x}_e(n) = \frac{\int_{I(n)} x(n)\, p[x(n) \mid \hat{x}_e(n-1)]\, dx}{\int_{I(n)} p[x(n) \mid \hat{x}_e(n-1)]\, dx}$$

This predictor incorporates the information provided by the base-layer (interval
within which $x(n)$ lies), and by the enhancement-layer (probability distribution of
$x(n)$ conditional on $\hat{x}_e(n-1)$).
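
For a scalar coefficient, the expectation over the interval $I(n)$ can be approximated numerically. The following is a minimal sketch under stated assumptions: the conditional density is supplied by the caller as a vectorized function `pdf` (hypothetical name), the interval is finite, and a hand-written trapezoidal rule stands in for whatever evaluation a real implementation would use.

```python
import numpy as np

def _trapezoid(y, x):
    """Plain trapezoidal rule on a sampled grid."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def interval_conditional_mean(pdf, lo, hi, num=2001):
    """Approximate E[x | x in [lo, hi]] for the density `pdf`.

    `pdf` plays the role of p[x(n) | x_hat_e(n-1)], and [lo, hi] the role of
    the interval I(n) implied by the base layer.
    """
    grid = np.linspace(lo, hi, num)
    weights = pdf(grid)
    mass = _trapezoid(weights, grid)
    if mass <= 0.0:
        # Density vanishes numerically on the interval: fall back to the midpoint.
        return 0.5 * (lo + hi)
    return _trapezoid(grid * weights, grid) / mass

def gaussian_pdf(mean, sigma):
    """Example conditional density (an assumption, not the model of the text)."""
    def pdf(x):
        return np.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    return pdf
```

For example, `interval_conditional_mean(gaussian_pdf(x_hat_e_prev, sigma), lo, hi)` returns a value inside `[lo, hi]` that is pulled toward `x_hat_e_prev`, which is the combining behaviour the estimate above describes.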

Referring now to FIG. 7 and FIG. 8, a system for scalable predictive
transform coding which is designed for the compression of video signals is shown.
In current practice and standards (e.g., [4]), the system uses motion
compensation for basic frame prediction, applies the discrete cosine transform
(DCT) to the prediction error (residual), and quantizes the transform coefficients
one at a time. A block diagram of a two-layer scalable video encoder is shown in
FIG. 7, and the corresponding decoder is shown in FIG. 8. FIG. 9 shows a
functional block diagram corresponding to the spatial motion compensator blocks
shown in the base layer and the enhancement layer.

Note that, for simplicity, the symbols $x$, $r$, $\hat{x}$, $\hat{r}$, $\tilde{x}$ for the video and residual
signals at the base and enhancement layers in the diagram are in the transform
domain, even though motion compensation is performed in the spatial domain
(FIG. 9). Note further that additional enhancement layers may be added where an
enhancement layer k builds on and relates to layer k-1 below it exactly as shown
for the first two enhancement layers.

The first-order Laplace-Markov process was chosen for modeling the time
evolution statistics of the video signal:

$$x(n) = \rho\, MC[x(n-1)] + z(n),$$

where $x(n)$ is the DCT coefficient in the current frame and $MC[x(n-1)]$ is the
corresponding (after motion compensation) coefficient in the previous frame. The
correlation coefficient $\rho$ is assumed to be nearly one. As $x(n)$ has a Laplacian
density, the driving process, $z(n)$, is zero-mean, white, stationary, and has the
density

$$p(z) = \rho^2\, \delta(z) + (1 - \rho^2)\, \frac{\alpha}{2}\, e^{-\alpha |z|}.$$
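
The mixture form of the driving density can be made concrete by simulation. This is a minimal sketch under assumptions not stated in the text: motion compensation is taken as the identity, the process is started from a Laplacian sample, and the function names are hypothetical. With probability $\rho^2$ the innovation is exactly zero; otherwise it is Laplacian with parameter $\alpha$.

```python
import numpy as np

def sample_innovation(rho, alpha, size, rng):
    """Draw z(n): zero with probability rho^2, else Laplacian with scale 1/alpha."""
    is_zero = rng.random(size) < rho ** 2
    laplacian = rng.laplace(loc=0.0, scale=1.0 / alpha, size=size)
    return np.where(is_zero, 0.0, laplacian)

def simulate_laplace_markov(rho, alpha, n, rng):
    """Simulate x(n) = rho * x(n-1) + z(n), with MC[.] assumed to be the identity."""
    z = sample_innovation(rho, alpha, n, rng)
    x = np.empty(n)
    x[0] = rng.laplace(loc=0.0, scale=1.0 / alpha)   # start from the stationary marginal
    for k in range(1, n):
        x[k] = rho * x[k - 1] + z[k]
    return x, z
```

A quick consistency check on the model: the innovation variance is $(1-\rho^2)\,2/\alpha^2$, so the stationary variance of $x(n)$ is $2/\alpha^2$, which matches a Laplacian with parameter $\alpha$.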

(Both $\alpha$ and $\rho$ may in practice be estimated "offline" from training data, or via an
adaptive estimator that tracks variations in local statistics of the signal.) The base
layer performs standard video compression; its predictor consists only of motion
compensation, $\tilde{x}_b(n) = MC[\hat{x}_b(n-1)]$, the residual $r_b(n) = x(n) - \tilde{x}_b(n)$ is quantized
and the corresponding index is transmitted. Let $[a, b]$ be the quantization interval,
hence $r_b(n) \in [a, b]$. Thus the information the base layer provides on $x(n)$ is
captured in the statement:

$$x(n) \in [\tilde{x}_b(n) + a,\ \tilde{x}_b(n) + b].$$

At the enhancement layer, the prediction exploits the information available from
both layers. The optimal predictor is given therefore by the expectation:

$$\tilde{x}_e(n) = E\{x(n) \mid \hat{x}_e(n-1),\ x(n) \in [\tilde{x}_b(n) + a,\ \tilde{x}_b(n) + b]\},$$

which is conveniently rewritten as

$$\tilde{x}_e(n) = \bar{x}_e(n-1) + E\{z(n) \mid z(n) \in I_z(n)\},$$

where

$$\bar{x}_e(n-1) = MC[\hat{x}_e(n-1)]$$

and the expectation interval is

$$I_z(n) = [\tilde{x}_b(n) + a - \bar{x}_e(n-1),\ \tilde{x}_b(n) + b - \bar{x}_e(n-1)].$$

This prediction is directly implemented using the model for $p(z)$ given above:

$$\tilde{x}_e(n) = \bar{x}_e(n-1) + \frac{\int_{I_z(n)} z\, p(z)\, dz}{\int_{I_z(n)} p(z)\, dz}.$$

The integral may be analytically evaluated, and its closed form solution, given
explicitly in terms of the integral limits and the parameters $\alpha$, $\rho$, is normally used
for simple implementation.

This embodiment of the invention is of low complexity, uses standard video
compression for its base layer, and provides substantial performance gains which
build up and increase with the number of layers implemented. Its absence in all
leading standards in spite of its gains and low complexity strongly suggests that
the invention is not obvious to the leading researchers and developers in the field
of video compression.

The scalable predictive coding method of the invention, although illustrated
herein on a two or three-layer scalable system, is repeatedly applicable to further
layers of enhancement in a straightforward manner. For example, at layer k we
combine signal information from the current reconstructed frame at layer k-1, and
from the previous reconstruction frame at layer k. A higher complexity version
allows for the combining rule to take into account data from all lower layers. In the
special implementation described, information from all lower layers contributes to
restricting the final interval within which the coefficient must lie. Another higher
complexity version uses higher order prediction (based on multiple past frames).
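
The closed form mentioned for the Laplace-Markov embodiment can be sketched as follows. This is one possible evaluation and not necessarily the form used in practice: it assumes a closed, finite interval, counts the point mass at zero whenever the interval contains zero, and uses hypothetical function and argument names. It relies on two standard Laplacian integrals, the cumulative distribution and the partial first moment $\int_{-\infty}^{z} t\,\frac{\alpha}{2}e^{-\alpha|t|}\,dt = -\frac{1}{2}\left(|z| + \frac{1}{\alpha}\right)e^{-\alpha|z|}$.

```python
import math

def laplace_cdf(z, alpha):
    """CDF of the Laplacian density (alpha/2) * exp(-alpha * |z|)."""
    if z < 0:
        return 0.5 * math.exp(alpha * z)
    return 1.0 - 0.5 * math.exp(-alpha * z)

def laplace_partial_mean(z, alpha):
    """Integral of t * (alpha/2) * exp(-alpha * |t|) for t from -infinity to z."""
    return -0.5 * (abs(z) + 1.0 / alpha) * math.exp(-alpha * abs(z))

def innovation_mean_on_interval(lo, hi, rho, alpha):
    """E{z | z in [lo, hi]} under p(z) = rho^2 delta(z) + (1 - rho^2) Laplacian."""
    continuous = 1.0 - rho ** 2
    numerator = continuous * (laplace_partial_mean(hi, alpha)
                              - laplace_partial_mean(lo, alpha))
    mass = continuous * (laplace_cdf(hi, alpha) - laplace_cdf(lo, alpha))
    if lo <= 0.0 <= hi:
        mass += rho ** 2          # the delta adds mass but no first moment
    if mass <= 0.0:
        return 0.5 * (lo + hi)    # degenerate interval: fall back to the midpoint
    return numerator / mass

def enhancement_prediction(x_tilde_b, a, b, x_bar_e, rho, alpha):
    """Enhancement predictor: x_bar_e(n-1) + E{z(n) | z(n) in I_z(n)}."""
    lo = x_tilde_b + a - x_bar_e
    hi = x_tilde_b + b - x_bar_e
    return x_bar_e + innovation_mean_on_interval(lo, hi, rho, alpha)
```

When the motion-compensated enhancement value `x_bar_e` lies inside the base-layer interval and $\rho$ is close to one, the point mass dominates and the prediction stays near `x_bar_e`; when it lies outside, the prediction is confined to the base-layer interval.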

Another application of the invention pertains to time-varying channels, such 
as mobile communications, and most common network communications. When 
the receiver experiences an improvement in channel conditions, it attempts to 
decode higher enhancement bits and improve the quality of the reconstruction. 
However, it cannot compute the enhancement layer prediction as past
enhancement layer reconstruction frames were not decoded and are not available. 
The present invention includes a solution to this problem, which comprises 
periodically (e.g., once per fixed number of frames) constraining the enhancement 
encoder to exclusively use lower layer information for the prediction. This periodic 
constrained prediction synchronizes the enhancement decoder with the 
enhancement encoder and allows the receiver to decode the enhancement-layer 
signals. The frequency of application of this constrained mode may be different 
for each layer and may be optimized for the time-varying channel statistics. The
trade-off is between some temporary degradation in prediction (when the
prediction is constrained) and the receiver's capability to upgrade to enhancement
layer performance as the channel conditions improve.
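
The periodic constrained mode can be sketched as a simple schedule. The details here are assumptions for illustration: the synchronization frames are taken to be those whose index is a multiple of a fixed `period`, and "lower layer information" is represented by a single value passed in by the caller. The function names are hypothetical.

```python
def is_sync_frame(n, period):
    """True when frame n is predicted from lower-layer information only."""
    return n % period == 0

def select_enhancement_prediction(n, period, lower_layer_prediction, combined_prediction):
    """Choose the enhancement-layer prediction for frame n.

    On synchronization frames the encoder and decoder both ignore past
    enhancement-layer frames, so a receiver that has just joined can follow.
    """
    if is_sync_frame(n, period):
        return lower_layer_prediction
    return combined_prediction

def first_upgradable_frame(join_index, period):
    """Earliest frame at or after join_index where a receiver can start decoding
    the enhancement layer without prior enhancement-layer frames."""
    return ((join_index + period - 1) // period) * period
```

A shorter `period` lets a receiver upgrade sooner after channel conditions improve, at the cost of using the constrained (weaker) prediction more often.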

Finally, it will be appreciated that the scalability advantages of the invention 
may be easily combined with known methods for temporal and spatial scalability. 

Accordingly, it will be seen that this invention provides for efficient 
scalability of predictive coding that is applicable to standard video and audio 
compression. The invention uses most or all of the information available at an 
enhancement-layer to improve the quality of the prediction. In addition, the 
invention provides for enhancement-layer synchronization to accommodate 
situations where the communication rate for a given receiver is time varying (e.g., 
in mobile communications). Although the description above contains many 
specificities, these should not be construed as limiting the scope of the invention 
but as merely providing illustrations of some of the presently preferred 
embodiments of this invention. Thus the scope of this invention should be 
determined by the appended claims and their legal equivalents. 
CLAIMS

What is claimed is: 

1. A method for predicting the current frame of data in a digital coding
system wherein a signal is segmented into frames of data that are sequentially
encoded, said system including a base layer and an enhancement layer, said
base layer including a base encoder and a base decoder, said enhancement layer
including an enhancement encoder and an enhancement decoder, said base
decoder producing a reconstructed signal, said enhancement decoder producing
an enhanced reconstructed signal, said method comprising the steps of:
predicting the current frame of data at the enhancement-layer by
processing and combining the reconstructed data representing the current base
layer frame and the reconstructed data representing the previous enhancement
layer frame.

2. A method for scalable predictive coding of a signal, comprising the
steps of:

(a) encoding data representing said signal with a base layer predictive
coding system that provides a first prediction of said signal and information
indicative of a decoded base layer approximation to said signal;

(b) encoding data representing said signal by a first enhancement layer
which performs predictive coding with a second prediction of said signal derived
from a combination of information from the base layer and information indicative of
the past decoded signal approximation generated in said first enhancement layer.

3. A method as recited in claim 2, wherein the step of encoding said
signal data with said enhancement layer comprises the steps of providing to said
first enhancement layer compression parameters from the base layer to aid in the
computation of said second prediction.

4. A method as recited in claim 2, wherein the step of encoding said
signal data with said first enhancement layer comprises the steps of providing to
said first enhancement layer time evolution statistics derived either by off-line
computation or by computations using quantized parameters of said signal.


5. A method as recited in claim 2, wherein said coding system includes a
second enhancement layer and wherein said second enhancement layer performs
predictive coding with a third prediction of said signal derived from a combination
of information from said first enhancement layer and information indicative of the
past decoded signal approximation generated in said second enhancement layer.

6. A method as recited in claim 2, wherein said second prediction at
predetermined intervals is derived exclusively from information from the base layer
and at all other times is derived by combining information from the base layer and
information indicative of the past decoded signal approximation generated in said
first enhancement layer.

7. An apparatus for predicting the current frame of data in a digital
coding system wherein a signal is segmented into frames of data that are
sequentially encoded, said system including a base layer and an enhancement
layer, said base layer including a base encoder and a base decoder, said
enhancement layer including an enhancement encoder and an enhancement
decoder, said base decoder producing a reconstructed signal, said enhancement
decoder producing an enhanced reconstructed signal, comprising:

means for predicting the current frame of data at the enhancement-layer by
processing and combining the reconstructed data representing the current base
layer frame and the reconstructed data representing the previous enhancement
layer frame.

8. An apparatus for scalable predictive coding of a signal, comprising:

(a) means for encoding data representing said signal with a base layer
predictive coding system that provides a first prediction of said signal and
information indicative of a decoded base layer approximation to said signal;

(b) means for encoding data representing said signal by a first
enhancement layer which performs predictive coding with a second prediction of
said signal derived from a combination of information from the base layer and
information indicative of the past decoded signal approximation generated in said
first enhancement layer.

9. An apparatus as recited in claim 8, wherein said means for encoding
said signal data with said enhancement layer comprises means for providing to
said first enhancement layer compression parameters from the base layer to aid in
the computation of said second prediction.

10. An apparatus as recited in claim 8, wherein said means for encoding
said signal data with said first enhancement layer comprises means for providing
to said first enhancement layer time evolution statistics derived either by off-line
computation or by computations using quantized parameters of said signal.

11. An apparatus as recited in claim 8, further comprising a second
enhancement layer, wherein said second enhancement layer performs predictive
coding with a third prediction of said signal derived from a combination of
information from said first enhancement layer and information indicative of the
past decoded signal approximation generated in said second enhancement layer.

12. An apparatus as recited in claim 8, wherein said second prediction at
predetermined intervals is derived exclusively from information from the base layer
and at all other times is derived by combining information from the base layer and
information indicative of the past decoded signal approximation generated in said
first enhancement layer.

13. A scalable predictive coding system for compressing a signal,
comprising at least one enhancement layer and at least one lower layer, wherein
prediction in an enhancement layer combines information from a lower layer with
information from the enhancement layer.

14. A scalable predictive coding method for compressing a signal in
system comprising at least one enhancement layer and at least one lower layer,
the method comprising the steps of performing prediction in an enhancement layer
by combining information from a lower layer with information from the
enhancement layer.